The genetic algorithm is an optimization procedure motivated by biological evolution and has been successfully applied to optimization problems in many different areas. A statistical mechanics model for its dynamics is proposed, based on the parent-child fitness correlation of the genetic operators, which makes it applicable to general fitness landscapes. It is compared to a recent model based on a maximum entropy ansatz. Finally, it is applied to modeling the dynamics of a genetic algorithm on the rugged fitness landscape of the NK model.

PACS numbers: 05.50.+q, 87.10.+e, 07.05.Mh, 02.60.Pn

I. INTRODUCTION

The idea of utilizing biological evolution [?] as a metaphor for an optimization algorithm is not recent [?]; practical implementations, however, had to wait until computers of adequate speed were available [?]. Recent advances in this technology have jump-started a surge of evolutionary approaches to optimization problems in real-world applications [?]. Applications to theoretical problems include the ground-state search in some spin glasses [?]. Despite this renewed activity, the dynamics of these algorithms is not nearly as well understood as that of other common optimization techniques, e.g., simulated annealing [?]. As one widely used representative of the biologically motivated optimization algorithms, the "genetic algorithm" bases its search on a set of search points (a "population" in the biological picture). A dynamical rule constructs a new population of search points from the old one in such a way that, on average, its energy decreases (or its "fitness" increases) with respect to the given optimization function. The dynamical operators include selection and reproduction of the fittest members, as well as mutation and recombination of members to generate new search points. One reason why this algorithm is difficult to model is the non-gradient nature of its search-space exploration [?], which allows non-local moves and complicates the treatment of its dynamics as a Markov chain [?].

A common approach to modeling physical systems with a large number of degrees of freedom is to find a few macroscopic variables that describe the average behavior of the system (e.g., the temperature of a gas of many atoms). In equilibrium systems, there are canonical procedures for identifying these variables. In systems far from equilibrium, such as genetic algorithms, one can sometimes identify distributions that tend to become stationary under the dynamics. Recently, a theoretical approach to the dynamics of the evolving, finite-size population of a genetic algorithm has been proposed that uses the fitness distribution of the population as the characteristic evolving quantity [?]. While this approach has been applied successfully to selected simple optimization problems [?], it becomes difficult for more complex problems, since it depends on an intricate maximum entropy estimation. This estimation is used to describe the action of the genetic operators in terms of a fitness distribution, where structural effects have to be averaged over and re-expressed in terms of fitnesses. In this study we go back one step and look at the lowest-order observables that determine the dynamics. In particular, we construct a simplified model based on the observation that genetic algorithm performance often correlates strongly with the parent-child fitness correlation. Using the selection scheme of [?], we use this correlation to construct a dynamical model, which is applied to a simple additive fitness function as well as to the spin-glass-motivated NK model. This shows how well the concept of a correlation determining the dynamics actually holds when modeling a genetic algorithm, and where its limits lie.

The motivation for our model stems from the observation that, although the time evolution of a genetic algorithm is difficult to understand, practitioners often have a quite distinct intuition about when a genetic algorithm will work well. One common statement in this respect is that the algorithm performs best when the fitnesses of parents and their children are strongly correlated. It has been found empirically that the performance of a genetic algorithm follows the fitness correlation of the genetic operators as well as the correlation length of the optimization landscape [10]. In fact, designing suitable genetic representation schemes often means increasing the parent-child fitness correlation. Finally, in biological systems the correlation between parents and children is usually very large. What does this intuition mean in the light of existing models for genetic algorithm dynamics? To what extent are correlation measures able to predict the algorithm dynamics? These are exactly the questions that we study in the following.
We first describe the simple version of a genetic algorithm, then review the correlation model and proceed with modeling the genetic operations on the basis of fitness correlations, first for mutation algorithms and finally for full genetic algorithms including recombination. The study concludes with a comparison of the model to numerical simulations of the corresponding algorithms.

II. A DYNAMICAL MODEL OF THE BASIC GENETIC ALGORITHM 

A genetic algorithm performs population-based optimization in a discrete search space. For our purposes we define a simple version of this algorithm. Let the function to be optimized be a real-valued function $f(S)$ on a binary search-space representation $S \in \{\pm 1\}^N$ of dimension $N$. Its value is the "fitness" of the test point $S$, which is to be maximized (alternatively, one could view $-f(S)$ as an energy to be minimized). In the biological picture, $S$ is the analogue of the genome. The algorithm starts from a random "population" of search points $S^a$ with $a = 1, \dots, P$, forming a population of $P$ strings with fitnesses $f_a = f(S^a)$. Subsequently, new search points are tested by means of three operations, called selection, mutation, and recombination. In the selection step, a new population is created: members are selected with probabilities defined on the basis of their fitness values. Members with a higher fitness are more likely to "survive" than those with a lower fitness, and the new population most likely has a higher average fitness than the old one. New search points are then created by flipping single spins with a small fixed probability $\gamma$ throughout the population; this is called the mutation step. Then pairs of strings are allowed to exchange a subset of their sites in a recombination step, analogous to the biological process of crossing over of genomes. This procedure is iterated, resulting in an evolution towards higher fitness values. It can be used to solve optimization problems where the problem is defined by means of a scalar fitness function $f(S)$. The choice of the structure of the search space and of the encoding of $f(S)$ often determines the convergence properties and performance of the algorithm, and is one major motivation for modeling genetic algorithm dynamics.

In the following, the dynamics of the genetic operators will be studied in more detail. We do this for two toy models: an additive fitness function on the one hand, and a more rugged function on the other. The first problem is the simplest additive function, a random field paramagnet

$$ f_a = \sum_{i=1}^{N} J_i S_i^a \tag{1} $$

with random couplings $J_i$ drawn from a Gaussian distribution with mean 0 and variance 1. The $N$ sites $S_i^a$ with $i = 1, \dots, N$ and $S_i^a = \pm 1$ form the genetic string of member $a$ of the population. The second function is the NK model fitness function [11]

$$ f_a = \sum_{i=1}^{N} E_i(S_i^a; S_{i_1}^a, \dots, S_{i_K}^a) \tag{2} $$

with $2^{K+1}$ random energy values $E_i(S^a)$ drawn from a uniform distribution over the interval $[0, 1]$ and a randomly chosen set of sites $i_1, \dots, i_K$, both chosen independently for each $i$. This function was originally formulated for the study of evolution on tunably rugged fitness landscapes, with application to the evolution of the immune response [12]. For these two functions, let us derive the dynamics under mutation and recombination.
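For concreteness, the two test functions (1) and (2) can be sketched in Python. This is an illustrative sketch, not the code used for the simulations reported here: spins are stored as lists of $\pm 1$, and the $K$ neighbor sites of each NK term are assumed to be distinct sites chosen uniformly at random among the other sites (the text does not fix this detail). The function names `make_paramagnet` and `make_nk` are hypothetical.

```python
import random


def make_paramagnet(N, rng):
    """Random field paramagnet, Eq. (1): f(S) = sum_i J_i S_i with J_i ~ N(0, 1)."""
    J = [rng.gauss(0.0, 1.0) for _ in range(N)]

    def fitness(S):
        return sum(j * s for j, s in zip(J, S))

    return fitness, J


def make_nk(N, K, rng):
    """NK fitness, Eq. (2): f(S) = sum_i E_i(S_i; S_{i_1}, ..., S_{i_K}).

    Each E_i is a table of 2**(K + 1) values drawn uniformly from [0, 1].
    Assumption: the K neighbours of site i are K distinct other sites chosen at random.
    """
    sites = []
    tables = []
    for i in range(N):
        others = [j for j in range(N) if j != i]
        sites.append([i] + rng.sample(others, K))
        tables.append([rng.random() for _ in range(2 ** (K + 1))])

    def fitness(S):
        total = 0.0
        for i in range(N):
            # Encode the K + 1 relevant spins (+1 -> bit 1, -1 -> bit 0) as a table index.
            index = 0
            for site in sites[i]:
                index = (index << 1) | (1 if S[site] > 0 else 0)
            total += tables[i][index]
        return total

    return fitness


rng = random.Random(0)
f_pm, J = make_paramagnet(16, rng)
f_nk = make_nk(16, 2, rng)
S = [rng.choice((-1, 1)) for _ in range(16)]
print(f_pm(S), f_nk(S))
```

Because each $E_i$ is uniform on $[0, 1]$, NK fitness values lie in $[0, N]$ and average $N/2$ over random strings.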

The dynamics of the model is described in terms of the fitness distribution $p(f)$ of the population, which is expressed as an expansion in cumulants as proposed in [?]. The cumulants $\kappa_n$ of $p(f)$ are defined through

$$ \ln \int df\, p(f)\, e^{t f} = \sum_{n=1}^{\infty} \frac{\kappa_n t^n}{n!}, \tag{3} $$

representing the mean, the variance, the skewness ($\kappa_3/\kappa_2^{3/2}$), the kurtosis ($\kappa_4/\kappa_2^2$), and higher moments of the fitness distribution. To give an intuitive picture, the first two cumulants roughly capture the infinite population size limit of the model. The higher cumulants, skewness and kurtosis, are important for describing the dynamics of a finite population where, e.g., selection causes the fitness distribution to quickly become skewed and thus deviate from a Gaussian. An evolving population can, at each time step, be approximated by a set of these variables, and its dynamics can then be viewed in terms of the evolution of the cumulants. In the following, the dynamics of an evolving population will be modeled using an expansion truncated after the first four cumulants. The different operations of a genetic algorithm, selection, mutation, and recombination, interact in different ways with this representation.
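The cumulants used throughout can be estimated from a finite population of fitness values. The sketch below, with the hypothetical helper names `cumulants` and `skew_and_kurtosis`, uses population (biased) central moments $\mu_n$, so that $\kappa_4 = \mu_4 - 3\mu_2^2$; whether biased or unbiased estimators are appropriate depends on how the finite population is treated and is an assumption here.

```python
def cumulants(fitnesses):
    """Return (kappa_1, kappa_2, kappa_3, kappa_4) of an empirical fitness distribution.

    Population (biased) estimators over the P members are used:
    kappa_2 = mu_2, kappa_3 = mu_3, kappa_4 = mu_4 - 3 mu_2**2.
    """
    P = len(fitnesses)
    k1 = sum(fitnesses) / P
    mu2, mu3, mu4 = [sum((f - k1) ** n for f in fitnesses) / P for n in (2, 3, 4)]
    return k1, mu2, mu3, mu4 - 3.0 * mu2 ** 2


def skew_and_kurtosis(kappa):
    """Skewness kappa_3 / kappa_2**1.5 and kurtosis kappa_4 / kappa_2**2."""
    _, k2, k3, k4 = kappa
    return k3 / k2 ** 1.5, k4 / k2 ** 2


print(cumulants([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25, 0.0, -2.125)
```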
First consider selection. We follow the formalism of [?] for modeling the selection operation on a fitness distribution in terms of cumulants. Boltzmann selection is considered, where a member with fitness $f_a$ is chosen from the population with probability

$$ p_a = \frac{e^{\beta f_a}}{Z}, \qquad Z = \sum_{a=1}^{P} e^{\beta f_a}, \tag{4} $$

where $\beta$ parametrizes the selection strength. After selection, the cumulants are given by

$$ \kappa_n^s = \frac{\partial^n}{\partial \beta^n} \langle \ln Z \rangle, \tag{5} $$

where the average is taken over all possible populations whose individual fitnesses are drawn from the given $p(f)$. This expression can be evaluated similarly to the random energy model [13], as shown in [?]. One obtains the cumulants after selection as functions of the cumulants before selection, either by means of numerical integrals or, in the limit of weak selection (small $\beta$), as an expansion. The cumulants after selection have been derived under the assumption that the new population is drawn from a continuous fitness distribution. Only the dominant finite-population effect is kept, which originates from the stochastic sampling of the new population in the selection step.
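A sketch of the selection step (4), assuming that the members of the new population are drawn independently with replacement (the stochastic sampling described above). The helper `expected_mean_after_selection` evaluates $\sum_a p_a f_a = \partial \ln Z / \partial \beta$ for one fixed population only; it does not perform the average over populations in (5). All function names are hypothetical.

```python
import math
import random


def boltzmann_weights(fitnesses, beta):
    """Selection probabilities p_a = exp(beta f_a) / Z of Eq. (4)."""
    f_max = max(fitnesses)  # shifting the exponent cancels in p_a and avoids overflow
    w = [math.exp(beta * (f - f_max)) for f in fitnesses]
    Z = sum(w)
    return [x / Z for x in w]


def select(population, fitnesses, beta, rng):
    """Draw a new population of the same size P, sampling members independently with p_a."""
    p = boltzmann_weights(fitnesses, beta)
    chosen = rng.choices(range(len(population)), weights=p, k=len(population))
    return [list(population[i]) for i in chosen], [fitnesses[i] for i in chosen]


def expected_mean_after_selection(fitnesses, beta):
    """sum_a p_a f_a = d ln Z / d beta for one fixed population (cf. Eq. (5), n = 1)."""
    return sum(p * f for p, f in zip(boltzmann_weights(fitnesses, beta), fitnesses))


fit = [0.0, 1.0]
print(expected_mean_after_selection(fit, 0.0))  # 0.5: beta = 0 selects uniformly
```

For small $\beta$ the mean after selection grows approximately as $\kappa_1 + \beta \kappa_2$, since $\partial^2 \ln Z / \partial \beta^2$ at $\beta = 0$ is the population variance.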

While selection is determined solely by the fitness distribution of the population, the other two operators act on the representation $S$ instead. The average effect of the mutation operator for the random field paramagnet has been worked out in [?] by averaging over all possible mutation events, yielding to lowest order

$$ \langle f_a \rangle_{\mathrm{mut}} = m f_a + (1 - m)\, \kappa_1^0 \tag{6} $$

with

$$ m = 1 - 2\gamma. \tag{7} $$

Similarly, for the NK model we derive the fitness of a mutated string by writing down (2) with each site $S_i^a$ multiplied by a random $\sigma_i^a = \pm 1$, equal to $-1$ with probability $\gamma$. The energy of a single term $E_i$ changes to another (randomly chosen) value if at least one of its $K+1$ sites is changed, and remains unchanged otherwise. The average fitness of a string after mutation is then obtained in an annealed approximation as (6), with

$$ m = (1 - \gamma)^{K+1}. \tag{8} $$
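The mutation operator and the two correlation values (7) and (8) are simple to write down; a minimal sketch with hypothetical function names follows.

```python
import random


def mutate(S, gamma, rng):
    """Mutation step: flip each spin independently with probability gamma."""
    return [-s if rng.random() < gamma else s for s in S]


def m_paramagnet(gamma):
    """Eq. (7): m = 1 - 2 gamma for the random field paramagnet."""
    return 1.0 - 2.0 * gamma


def m_nk(gamma, K):
    """Eq. (8): m = (1 - gamma)**(K + 1), the probability that none of the
    K + 1 spins entering one energy term E_i is flipped."""
    return (1.0 - gamma) ** (K + 1)


N, K = 128, 8
gamma = 1.0 / (2 * N)
print(m_paramagnet(gamma), m_nk(gamma, K))
```

With the simulation parameters of Sec. III ($N = 128$, $K = 8$, $\gamma = 1/(2N)$), this gives $m = 0.9921875$ for the paramagnet and $m \approx 0.965$ for the NK model.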



For both functions, we can write the mean fitness $\tilde\kappa_1$ of the potential children of a parent with fitness $f_a$ in the form (6), with a function-dependent constant $m$. In general terms, $m$ is the fitness correlation of a genetic operator (here: mutation) with respect to a specific fitness landscape (here: $f$). This observation motivates using $m$ as a measure of the lowest-order genetic algorithm dynamics on general landscapes. Defined in terms of an average over the population and over possible mutation events, $m$ can also be expressed as

$$ m = \frac{\langle f_a f_a^m \rangle_{a,\mathrm{mut}} - \langle f_a \rangle_a \langle f_a^m \rangle_{a,\mathrm{mut}}}{\langle f_a^2 \rangle_a - \langle f_a \rangle_a^2}, \tag{9} $$

where $f_a^m$ denotes the fitness of member $a$ after mutation. It can thus be measured directly for a given fitness function. Here, it parametrizes the average fitness of a member after mutation (given its fitness before mutation) and will be used below to obtain a lowest-order approximation of the population dynamics.
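Since (9) involves only the fitness values of parents and mutated children, $m$ can be estimated by sampling. The sketch below does this for the random field paramagnet, with one mutated child per parent; using random rather than evolved parents, and the values $N = 64$ and $\gamma = 0.1$, are assumptions made for illustration, and the names are hypothetical. For the paramagnet the estimate should come out close to $1 - 2\gamma$.

```python
import random


def mutate(S, gamma, rng):
    """Flip each spin independently with probability gamma."""
    return [-s if rng.random() < gamma else s for s in S]


def estimate_m(fitness, parents, gamma, rng):
    """Parent-child fitness correlation of Eq. (9), one mutated child per parent:
    m = Cov(f_a, f_a^m) / Var(f_a), averaged over the given parents."""
    f_par = [fitness(S) for S in parents]
    f_mut = [fitness(mutate(S, gamma, rng)) for S in parents]
    n = len(parents)
    mean_par = sum(f_par) / n
    mean_mut = sum(f_mut) / n
    cov = sum((x - mean_par) * (y - mean_mut) for x, y in zip(f_par, f_mut)) / n
    var = sum((x - mean_par) ** 2 for x in f_par) / n
    return cov / var


rng = random.Random(42)
N, gamma = 64, 0.1
J = [rng.gauss(0.0, 1.0) for _ in range(N)]


def paramagnet(S):
    """Eq. (1) with the couplings J drawn above."""
    return sum(j * s for j, s in zip(J, S))


parents = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(4000)]
m_hat = estimate_m(paramagnet, parents, gamma, rng)
print(m_hat, 1 - 2 * gamma)
```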

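Let us check how the next order relates to this picture. The fitness variance of the mutated members $f_a^m$ derived from a single parent with fitness $f_a$ has been calculated for the random field paramagnet in [?] as

$$ \tilde\kappa_2 = \langle (f_a^m)^2 \rangle_{\mathrm{mut}} - \langle f_a^m \rangle_{\mathrm{mut}}^2 = (1 - m^2) \sum_{i=1}^{N} J_i^2. \tag{10} $$

For comparison, we obtain for the NK model by a similar calculation

$$ \tilde\kappa_2 = (1 - m^2)\,\frac{N}{12} + m(1 - m) \left[ \sum_{i=1}^{N} (E_i^a)^2 - f_a + \frac{N}{6} \right], \tag{11} $$

where $E_i^a$ denotes the energy term $E_i$ evaluated on the string $S^a$.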
Apart from the first-order terms, these expressions are not of exactly the same type. However, looking for a lowest-order rule to approximate the variance after mutation, let us consider the average over the class of allowed functions. Averaging over $J$ and $E$, respectively, for the two problems, the properties of a particular realization drop out and one obtains

$$ \langle \tilde\kappa_2 \rangle = (1 - m^2)\, \kappa_2^0. \tag{12} $$

This is again a function of the fitness correlation $m$ alone, motivating a model of the genetic algorithm dynamics. It corresponds to modeling the distribution of fitness values after mutation from a parent of fitness $f_a$ by the ansatz

$$ p(f^m \mid f_a) = \frac{1}{\sqrt{2\pi \tilde\kappa_2}} \exp\left( -\frac{(f^m - \tilde\kappa_1)^2}{2 \tilde\kappa_2} \right) \tag{13} $$

with

$$ \tilde\kappa_1 = m f_a + (1 - m)\, \kappa_1^0, \qquad \tilde\kappa_2 = (1 - m^2)\, \kappa_2^0 \tag{14} $$

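where $\kappa_1^0$ and $\kappa_2^0$ are the cumulants of the initial, random distribution. This ansatz reflects the empirical observations about genetic algorithm performance on correlated landscapes [10]. With this ansatz, the cumulants of the fitness distribution after mutation are predicted as

$$ \begin{aligned} \kappa_1^m &= m \kappa_1 + (1 - m)\, \kappa_1^0, \\ \kappa_2^m &= m^2 \kappa_2 + (1 - m^2)\, \kappa_2^0, \\ \kappa_3^m &= m^3 \kappa_3, \\ \kappa_4^m &= m^4 \kappa_4. \end{aligned} \tag{15} $$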
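The mutation step of the correlation model is thus a map on the first four cumulants. The sketch below implements (15) together with a sampler for the Gaussian child distribution of (13)-(14); the names are hypothetical. Iterating the map drives $\kappa_1$ and $\kappa_2$ towards $\kappa_1^0$ and $\kappa_2^0$ while $\kappa_3$ and $\kappa_4$ decay.

```python
import math
import random


def mutation_cumulants(kappa, kappa0, m):
    """Eq. (15): cumulants (k1, k2, k3, k4) after mutation with fitness correlation m.
    Only the first two random-population cumulants kappa0[0], kappa0[1] enter."""
    k1, k2, k3, k4 = kappa
    return (m * k1 + (1 - m) * kappa0[0],
            m ** 2 * k2 + (1 - m ** 2) * kappa0[1],
            m ** 3 * k3,
            m ** 4 * k4)


def sample_child_fitness(f_parent, m, kappa0, rng):
    """Draw one child fitness from the Gaussian ansatz of Eqs. (13)-(14)."""
    mean = m * f_parent + (1 - m) * kappa0[0]
    std = math.sqrt((1 - m ** 2) * kappa0[1])
    return rng.gauss(mean, std)


state = (10.0, 4.0, 2.0, 1.0)
kappa0 = (0.0, 1.0, 0.0, 0.0)
for _ in range(200):
    state = mutation_cumulants(state, kappa0, 0.9)
print(state)  # close to (0.0, 1.0, 0.0, 0.0)
```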

This is a lowest-order model for mutation dynamics based on a given parent-child fitness correlation $m$. To compare this prediction with a direct calculation from the fitness functions, the distribution of the population after mutation is obtained by an additional average over the parents in all possible populations. Neglecting finite-population effects in the mutation step, which are much smaller than those in selection, the first cumulant of the distribution of the population after mutation is

$$ \kappa_1^m = m \kappa_1 + (1 - m)\, \kappa_1^0 \tag{16} $$

for the random field paramagnet, as derived in [?]. We obtain the same expression for the NK model. The second cumulant for the random field paramagnet has been derived as

$$ \kappa_2^m = m^2 \kappa_2 + (1 - m^2) \sum_{i=1}^{N} J_i^2. \tag{17} $$

In comparison, we obtain for the NK model

$$ \kappa_2^m = m^2 \kappa_2 + (1 - m^2)\,\frac{N}{12} + m(1 - m) \left\langle \sum_{i=1}^{N} (E_i^a)^2 - f_a + \frac{N}{6} \right\rangle_a. \tag{18} $$

The full second-order expression cannot be derived from knowledge of $m$ alone, due to the fluctuations in the third term of (18).

In general, one finds that the cumulants after mutation do not depend on the cumulants before mutation alone. The dynamics also depends on properties of the genetic coding, since mutation acts directly on the underlying representation. If the fitness distribution is all one knows about a population, one has to make additional assumptions when modeling the dynamics in order to describe the underlying dynamics of the genetic variables correctly. The model developed in [?] utilizes a maximum entropy estimation for this purpose. Since this is a complicated method for general hard optimization problems, we here use a different approach and concentrate on a lowest-order dynamical model on the basis of the correlation $m$. This method is more accessible for complicated fitness landscapes. In the example of the rugged NK landscape, we find that the first two cumulants after mutation are well reproduced in terms of $m$, with the fluctuations in the third term of (18) being small. Therefore, we approximate the second cumulants of both examples above by an expression as in (15). This set of equations derives the mutation cumulants from a model for a "microscopic" mutation event, (13), on the basis of the fitness correlation of mutation applied to the landscape. Applying mutation to a member $a$ of the population with initial fitness $f_a$, the resulting member will in general have a different fitness value $f_a^m$. The case of maximum correlation, $f_a^m = f_a$, occurs when mutation does not affect the fitness of the child at all. In the other extreme of violent mutation, the fitness changes randomly, and the child's fitness is distributed according to the fitness distribution $p^0$ of a random population. We parametrize the possible correlations between these two extremes by approximating the fitness distribution of the child as in (13) and (14). In this way, the degree of correlation between parent and child under mutation is parametrized by $m$. The choice of the two limiting distributions is natural since, in general, $p^0$ is the fixed-point distribution of the mutation operator. On this basis, (15) defines a closed expression for the fitness distribution of the population after mutation as a function of the distribution prior to mutation. It will serve as an iteration step in describing the complete dynamics.

A similar approach as for mutation can be adopted for recombining genetic strings. In the simple genetic algorithm modeled here, the recombination operation is defined symmetrically in the two children produced:

1. Group the individuals of the population into pairs.

2. Recombine each pair, i.e., at each bit position, swap the corresponding spins of the two strings with a given probability $a$.

3. For each pair, replace the parents by the two children thus produced.
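The three recombination steps can be sketched as follows. Two details are not fixed by the text and are assumptions here: pairs are formed by a random shuffle, and for an odd population size the unpaired member is copied unchanged. The function name `recombine` is hypothetical.

```python
import random


def recombine(population, a, rng):
    """Steps 1-3: pair the members, swap each bit position of a pair with
    probability a, and replace both parents by the two children.

    Assumptions: pairs come from a random shuffle; for odd P the unpaired
    member is copied unchanged."""
    order = list(range(len(population)))
    rng.shuffle(order)
    children = [list(S) for S in population]  # copies; the input is not modified
    for k in range(0, len(order) - 1, 2):
        x, y = children[order[k]], children[order[k + 1]]
        for site in range(len(x)):
            if rng.random() < a:
                x[site], y[site] = y[site], x[site]
    return children


rng = random.Random(0)
pop = [[1, 1, 1, 1], [-1, -1, -1, -1], [1, -1, 1, -1], [-1, 1, -1, 1]]
print(recombine(pop, 0.5, rng))
```

Swapping sites within a pair leaves the multiset of spins at every site unchanged, so recombination alone never introduces spin values that are absent from the population at that site.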

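For the random field paramagnet model, according to [?], the fitness of one child $f_{\alpha\beta}$ produced by recombining parents $\alpha$ and $\beta$, averaged over all possible recombination events, is

$$ \langle f_{\alpha\beta} \rangle_{\mathrm{cross}} = a f_\beta + (1 - a) f_\alpha. \tag{19} $$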

Along similar lines, for the NK model, let us write the fitness of a child averaged over all possible recombination events as

$$ \langle f_{\alpha\beta} \rangle_{\mathrm{cross}} = \sum_{i=1}^{N} \sum_{n=0}^{K+1} \binom{K+1}{n} a^n (1-a)^{K+1-n} \Big[ E_i(S^\alpha)\, q^n \left(1 - q^{K+1-n}\right) + E_i(S^\beta)\, q^{K+1-n} \left(1 - q^n\right) + \tfrac{1}{2} q^{K+1} \left( E_i(S^\alpha) + E_i(S^\beta) \right) + \tfrac{1}{2} \left(1 - q^n\right)\left(1 - q^{K+1-n}\right) \Big]. \tag{20} $$

Here, $n$ denotes the number of sites swapped between the arguments of one corresponding energy term $E_i$ of the parents. Each term of the sum over $n$ is weighted by the probability of exactly $n$ swapped sites in the set of $K+1$ sites relevant for each energy term. The last term is the annealed average of a child's energy term $E_i$, where $q$ is the average probability that two corresponding sites $S_i^\alpha$ and $S_i^\beta$ are equal in the population. For both fitness functions, the post-recombination fitness can be cast into the unified expression

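$$ \langle f_{\alpha\beta} \rangle_{\mathrm{cross}} = c_{\alpha\beta} f_\alpha + c_{\beta\alpha} f_\beta + (1 - c_{\alpha\beta} - c_{\beta\alpha})\, \kappa_1^0 \tag{21} $$

with $c_{\alpha\beta} = 1 - a$ and $c_{\beta\alpha} = a$ for the random field paramagnet, and

$$ c_{\alpha\beta} = \left[1 - a + a q\right]^{K+1} - \tfrac{1}{2} q^{K+1}, \qquad c_{\beta\alpha} = \left[a + (1 - a) q\right]^{K+1} - \tfrac{1}{2} q^{K+1} \tag{22} $$

with $\kappa_1^0 = N/2$ for the NK model. The parameters $c_{\alpha\beta}$ and $c_{\beta\alpha}$ correspond to the average fitness correlations of the child $f_{\alpha\beta}$ with either one of its parents, $f_\alpha$ or $f_\beta$, after recombination: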

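$$ c_{\alpha\beta} = \frac{\langle f_\alpha f_{\alpha\beta} \rangle_{\alpha\neq\beta,\mathrm{cross}} - \langle f_\alpha \rangle_\alpha \langle f_{\alpha\beta} \rangle_{\alpha\neq\beta,\mathrm{cross}}}{\left(1 - \frac{1}{P}\right) \kappa_2}, \qquad c_{\beta\alpha} = \frac{\langle f_\beta f_{\alpha\beta} \rangle_{\alpha\neq\beta,\mathrm{cross}} - \langle f_\beta \rangle_\beta \langle f_{\alpha\beta} \rangle_{\alpha\neq\beta,\mathrm{cross}}}{\left(1 - \frac{1}{P}\right) \kappa_2}. \tag{23} $$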



Here, the averaging is done over all members of the population and, after recombination, over all possible pairings and all possible recombination events. From the post-recombination fitness (21) we can again derive the average fitness of the population after recombination by averaging over all potential parents, yielding

$$ \kappa_1^c = c\, \kappa_1 + (1 - c)\, \kappa_1^0 \tag{24} $$

with $c = c_{\alpha\beta} + c_{\beta\alpha}$. For the first-order cumulants, we find that both functions, the random field paramagnet and the NK model, fall into the same model class. This will motivate us below to use the correlation $c$ for a lowest-order dynamical model of recombination. The higher moments can be derived in a similar fashion. We define the fitness variance of the population after recombination as

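$$ \kappa_2^c = \langle f_{\alpha\beta}^2 \rangle_{\alpha\neq\beta,\mathrm{cross}} - \langle f_{\alpha\beta} \rangle_{\alpha\neq\beta,\mathrm{cross}}^2, \tag{25} $$

that is, as the variance of an infinite-size population of children derived from a finite parent population, averaged over all allowed recombination events and parental pairs. This has been calculated in [?] for the random field paramagnet as

$$ \kappa_2^c = \kappa_2. \tag{26} $$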

In this case, recombination leaves the mean and variance of the fitness untouched. The third cumulant is defined as

$$ \kappa_3^c = \langle f_{\alpha\beta}^3 \rangle_{\alpha\neq\beta,\mathrm{cross}} - 3 \kappa_1^c \kappa_2^c - (\kappa_1^c)^3 \tag{27} $$

and, dropping spatial correlations, is given in [?] as

$$ \kappa_3^c = \left[a^3 + (1 - a)^3\right] \kappa_3 - 6 a (1 - a) \sum_{i=1}^{N} J_i^3 \left\{ \langle S_i^\alpha \rangle_\alpha - \langle S_i^\alpha S_i^\beta S_i^\gamma \rangle_{\alpha\neq\beta\neq\gamma} \right\}. \tag{28} $$

One now also obtains terms containing spin correlations. In the model of [?], these are estimated by a maximum entropy estimation, summing over the search-space regions corresponding to a given fitness distribution of the population. As this approach again becomes impractical for real optimization problems with hard or unknown fitness functions, we here study the simpler approach of describing recombination dynamics on the basis of the fitness correlation $c$. We will see below that this works well in cases where the fluctuations from spin correlations remain small, as in the case of the random field paramagnet. For comparison, let us also consider recombination on the more difficult landscape of the NK model. For the variance we obtain



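$$ \begin{aligned} \kappa_2^c ={}& (c_{\alpha\beta} + c_{\beta\alpha})\, \kappa_2 + (1 - c_{\alpha\beta} - c_{\beta\alpha})\, \kappa_2^0 \\ &+ (c_{\alpha\beta} + c_{\beta\alpha})(1 - c_{\alpha\beta} - c_{\beta\alpha}) \left[ \left(1 - \frac{1}{N}\right) \kappa_1^2 + \frac{1}{N} \left(\kappa_1 - \kappa_1^0\right)^2 \right] \\ &+ \sum_{i=1}^{N} \sum_{j \neq i} \left\{ \left(c_{\alpha\beta}^2 + c_{\beta\alpha}^2 - c_{\alpha\beta} - c_{\beta\alpha}\right) \langle E_i^\alpha E_j^\alpha \rangle_\alpha + 2 c_{\alpha\beta} c_{\beta\alpha} \langle E_i^\alpha E_j^\beta \rangle_{\alpha\neq\beta} \right\}. \end{aligned} \tag{29} $$

This is now a different situation than before: the double sum contains large fluctuations from correlations between energy terms within strings and within the population. In this case there is no strong limit of vanishing spatial correlations; instead, $\langle E_i^\alpha E_j^\alpha \rangle_\alpha - \langle E_i^\alpha \rangle_\alpha \langle E_j^\alpha \rangle_\alpha \neq 0$ for $i \neq j$, such that $\langle E_i^\alpha E_j^\alpha \rangle_\alpha \approx \langle E_i^\alpha E_j^\beta \rangle_{\alpha\neq\beta}$ is only weakly fulfilled. This is due to each energy term $E_i$ being coupled to neighboring spins in the string. When running a real genetic algorithm with $a = 1/2$ (as is often done), one observes fluctuations with magnitudes comparable to the leading terms and of either sign. The very last term contributes even in the limit of large correlations $c_{\alpha\beta}$ and $c_{\beta\alpha}$, where the preceding term is suppressed. Here, any simple approximation breaks down, as does our intuition about a correlation governing the evolution. In fact, one is led to suspect that the issue is not whether a working description exists, but whether recombination helps at all in this limit. It is clearly disruptive here, resulting in low correlation, a limit never encountered in biological evolution. For the NK model we therefore consider the less disruptive case of asymmetric recombination, which results in better genetic algorithm performance, and choose $a = 1/(2N)$. In this limit $c_{\alpha\beta} \gg c_{\beta\alpha}$, and the variance can be written in the simple form

$$ \kappa_2^c = c\, \kappa_2 + (1 - c)\, \kappa_2^0 + c(1 - c) \left[ \left(1 - \frac{1}{N}\right) \kappa_1^2 - \sum_{i=1}^{N} \sum_{j \neq i} \langle E_i^\alpha E_j^\alpha \rangle_\alpha + \frac{1}{N} \left(\kappa_1 - \kappa_1^0\right)^2 \right] \tag{30} $$

with $c = c_{\alpha\beta} + c_{\beta\alpha}$. The remaining correlation term is now balanced by $(1 - 1/N)\, \kappa_1^2$, and the difference is suppressed by the factor $(1 - c)$. In this case, the lowest-order model is

$$ \kappa_2^c = c\, \kappa_2 + (1 - c)\, \kappa_2^0. \tag{31} $$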
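The coefficients (22) make the contrast between symmetric and asymmetric crossover concrete. The sketch evaluates them for $K = 8$ and $N = 128$, assuming an uncorrelated population with $q = 1/2$; the function name `c_coefficients_nk` is hypothetical.

```python
def c_coefficients_nk(a, q, K):
    """Eq. (22): average fitness correlations (c_ab, c_ba) of a child with its two
    parents in the NK model, for swap probability a and the probability q that
    two corresponding spins in the population are equal."""
    c_ab = (1 - a + a * q) ** (K + 1) - 0.5 * q ** (K + 1)
    c_ba = (a + (1 - a) * q) ** (K + 1) - 0.5 * q ** (K + 1)
    return c_ab, c_ba


N, K, q = 128, 8, 0.5          # q = 1/2: uncorrelated random population (assumption)
for a in (0.5, 1 / (2 * N)):   # symmetric vs. asymmetric crossover
    c_ab, c_ba = c_coefficients_nk(a, q, K)
    print(f"a = {a:.5f}: c_ab = {c_ab:.4f}, c_ba = {c_ba:.4f}, c = {c_ab + c_ba:.4f}")
```

Under these assumptions, symmetric crossover ($a = 1/2$) gives $c \approx 0.15$, whereas $a = 1/(2N)$ gives $c \approx 0.98$ with $c_{\alpha\beta} \gg c_{\beta\alpha}$, which illustrates why the asymmetric case is far less disruptive. For $a = 0$ the coefficients sum to exactly 1, since the child is then a copy of parent $\alpha$.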

Now, having the next-to-lowest-order behavior of the two functions at hand, with the identical lowest-order term (31), let us again use this as a motivation for a dynamical model based on $c$, as done before in the case of mutation. For this purpose we approximate the distribution after recombination by the conditional probability density

$$ p(f_{\alpha\beta} \mid f_\alpha, f_\beta) = \frac{1}{\sqrt{2\pi \tilde\kappa_2}} \exp\left( -\frac{(f_{\alpha\beta} - \tilde\kappa_1)^2}{2 \tilde\kappa_2} \right) \tag{32} $$

with suitable moments $\tilde\kappa_1$ and $\tilde\kappa_2$. The mean fitness of the children of a given pair of parents is

$$ \tilde\kappa_1 = c_{\alpha\beta} f_\alpha + c_{\beta\alpha} f_\beta + (1 - c)\, \kappa_1^0, \tag{33} $$

as motivated by (21); this matches the fitness correlation picture used for mutation. Since the two parents in general have different fitness values, the recombination event also introduces a variance. Let us first consider the random field paramagnet. Here one finds

$$ \langle f_{\alpha\beta}^2 \rangle_{\mathrm{cross}} - \langle f_{\alpha\beta} \rangle_{\mathrm{cross}}^2 = 2a(1 - a) \sum_{i=1}^{N} J_i^2 \left(1 - S_i^\alpha S_i^\beta\right). \tag{34} $$

Therefore, and since spatial correlations vanish here, the variance of the distribution of potential children can be modeled by

$$ \tilde\kappa_2 = \left(c_{\alpha\beta} f_\alpha - c_{\beta\alpha} f_\beta\right)^2 + (1 - c^2)\, \kappa_2^0. \tag{35} $$
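For the paramagnet, the variance of one child's fitness over recombination events for a fixed pair of parents follows from counting sites: each site where the parents differ contributes $4a(1-a)J_i^2$, i.e. $2a(1-a)\sum_i J_i^2 (1 - S_i^\alpha S_i^\beta)$ in total. The sketch compares this count with a Monte Carlo estimate in which the child keeps $S^\alpha$ and takes each site from $S^\beta$ with probability $a$; the helper names and the values $N = 20$, $a = 0.3$ are arbitrary illustrative choices.

```python
import random


def child_fitness_variance_exact(J, S_a, S_b, a):
    """Per-pair child variance for the paramagnet: 2 a (1 - a) sum_i J_i^2 (1 - S_i^a S_i^b)."""
    return sum(2 * a * (1 - a) * j * j * (1 - x * y) for j, x, y in zip(J, S_a, S_b))


def child_fitness_variance_mc(J, S_a, S_b, a, rng, samples=20000):
    """Monte Carlo estimate over recombination events: each site of the child
    is taken from S_b with probability a and from S_a otherwise."""
    values = []
    for _ in range(samples):
        child = [y if rng.random() < a else x for x, y in zip(S_a, S_b)]
        values.append(sum(j * s for j, s in zip(J, child)))
    mean = sum(values) / samples
    return sum((v - mean) ** 2 for v in values) / samples


rng = random.Random(11)
N, a = 20, 0.3
J = [rng.gauss(0.0, 1.0) for _ in range(N)]
S_a = [rng.choice((-1, 1)) for _ in range(N)]
S_b = [rng.choice((-1, 1)) for _ in range(N)]
exact = child_fitness_variance_exact(J, S_a, S_b, a)
estimate = child_fitness_variance_mc(J, S_a, S_b, a, rng)
print(exact, estimate)
```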

With this assumption, and with $c_{\alpha\beta} = c_{\beta\alpha}$ for the random field paramagnet (uniform crossover with $a = 1/2$), the fitness distribution after recombination $p(f_{\alpha\beta})$ is predicted to have the cumulants

$$ \begin{aligned} \kappa_1^c &= c\, \kappa_1 + (1 - c)\, \kappa_1^0, \\ \kappa_2^c &= c^2 \kappa_2 + (1 - c^2)\, \kappa_2^0, \\ \kappa_3^c &= c^3 \kappa_3. \end{aligned} \tag{36} $$

The mean and variance of the population after recombination are therefore correctly predicted for the random field paramagnet. The higher orders are off by a constant factor; they cannot be matched exactly within the second-order correlation model, which would require the higher moments $\tilde\kappa_3$ and $\tilde\kappa_4$. The fixed-point distribution for recombination (which, unlike that for mutation, does not equal that of a random population) is small enough here to allow neglecting the fluctuations in (28) and in higher moments. We will use this set of cumulants for the numerical model below.

What happens in the case of the NK model? Here, the exchange of spins between the genomes acts more like mutation than like a sharing of knowledge between them. Using the same arguments as in the mutation model (13), we choose

$$ \tilde\kappa_1 = c_{\alpha\beta} f_\alpha + c_{\beta\alpha} f_\beta + (1 - c)\, \kappa_1^0, \qquad \tilde\kappa_2 = (1 - c^2)\, \kappa_2^0. \tag{37} $$

In the asymmetric case $c \approx c_{\alpha\beta}$, the model then predicts a post-recombination population distributed according to (15) with $m$ replaced by $c$, while the direct calculation suggests

$$ \kappa_n^c = c\, \kappa_n + (1 - c)\, \kappa_n^0. \tag{38} $$

The correlation model therefore correctly predicts the mean fitness of the population after recombination, but deviates in the higher orders. It is nevertheless numerically close to the direct calculation, which will be simulated up to $n = 4$ below. An alternative microscopic model of recombination that predicts all leading orders to be linear in $c$ was proposed in [14]; however, it does not improve the case under consideration here. The sets of cumulants from selection, $\kappa_n^s$, mutation, $\kappa_n^m$, and recombination, $\kappa_n^c$, now define the iteration step of one "generation" of the dynamical model, with the operations being applied in this order.
III. NUMERICAL COMPARISON OF THE MODEL TO A GENETIC ALGORITHM

In numerical simulations, the correlation model is now compared to the evolution of a real genetic algorithm. The fitness distribution of the initial, random population for the random field paramagnet function is given by

$$ \begin{aligned} \kappa_1^0 &= \langle f_a \rangle_{a,S,J} = 0, \\ \kappa_3^0 &= 0, \\ \kappa_4^0 &= -\frac{6N^2}{P}\left(1 - \frac{1}{P}\right)\left(1 + \frac{2}{N}\right) - 6N\left(1 - \frac{1}{P}\right)\left(1 - \frac{6}{P} + \frac{6}{P^2}\right). \end{aligned} \tag{39} $$

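Since the average dynamics of a whole class of functions is considered here, the last value differs from the pure ensemble average $\langle \dots \rangle_S$ through the additional average over all possible functions, $\langle \dots \rangle_J$. The corresponding results for the NK model are

$$ \kappa_1^0 = \langle f_a \rangle_{a,S,E} = \frac{N}{2}, \qquad \kappa_2^0 = \left\langle \langle f_a^2 \rangle_a - \langle f_a \rangle_a^2 \right\rangle_{S,E} = \left(1 - \frac{1}{P}\right)\left(1 - \frac{1}{2^{K+1}}\right)\frac{N}{12} \tag{40} $$

and, omitting terms of order $2^{-K}$,

$$ \kappa_4^0 = -\left(1 - \frac{1}{P}\right)\left[ \frac{N^2}{24P} + \left(1 - \frac{6}{P} + \frac{6}{P^2}\right)\frac{N}{120} \right]. \tag{41} $$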

The simulation results are averaged over 10000 runs of a genetic algorithm with population size $P = 50$ and selection strength $\beta_s = 0.01$ (with a newly chosen random fitness function for each run). The length of the genetic string is $N = 128$ sites, and the mutation probability for each site is $\gamma = 1/(2N)$. In Fig. 1, the iterated cumulant expansion is compared to the dynamics of a genetic algorithm for the random field paramagnet with selection and mutation. The solid curves show the mean and variance of the genetic algorithm fitness distribution and are well described by the theoretical approach, shown by the dashed curves. The theoretical model is based on the constant correlation value $m$. Although the correlation among genotypes in the population changes considerably over time, the fitness correlation $m$ in fact remains constant when measured in the population over time. Here, the correlation $m$ appears to contain the basic information about the dynamics. In Fig. 2, the evolution of the NK model fitness distribution under selection and mutation is shown for a model with $P = 50$, $N = 128$, and $K = 8$. Again, the solid curves show the mean and variance of the measured genetic algorithm fitness distribution. The correlation $m$ is taken as derived above for the NK function. All other parameters are chosen as in the previous case. In the plot, $\kappa_1$ is shown as $\kappa_1 - \kappa_1^0$. As the figure shows, the evolution of the genetic algorithm is predicted correctly also on the rugged fitness landscape of the NK model. The model yields a satisfactory prediction, especially when comparing the very simple correlation model to the maximum entropy model of [?].

Adding the recombination step to the simulation of the random field paramagnet, the modeling based on parent-child fitness correlations is shown in Fig. 3. Crossover has been defined to be "uniform", where each site is swapped between the parents with probability $a = 0.5$ and both resulting children are kept. Here, the model uses the theoretical value $c = 1$, while the remaining parameters of the simulations and the model are chosen as above. In the real genetic algorithm applied here to the random field paramagnet problem, recombination improves the performance as compared to Fig. 1. This is correctly predicted by the correlation model.

While for the random field paramagnet we saw that $c$ remains constant over the course of evolution, for the NK model $c$ depends on the probability $q$ that two equal spins meet in a recombination event. This is a quantity that cannot be expressed in terms of the fitness distribution alone. For the purpose of the numerical comparison, we estimate this probability from the average pair correlation in the population, $\langle q_{\alpha\beta} \rangle_{\alpha\neq\beta}$, with

$$ q_{\alpha\beta} = \frac{1}{N} \sum_{i=1}^{N} S_i^\alpha S_i^\beta, \tag{42} $$

such that

$$ q = \frac{1 + \langle q_{\alpha\beta} \rangle_{\alpha\neq\beta}}{2}. \tag{43} $$
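Estimating $q$ from a population via (42) and (43) amounts to averaging the normalized overlap over all ordered pairs of distinct members. A minimal sketch with hypothetical function names; the resulting $q$ is the quantity that enters (22).

```python
def pair_overlap(S_a, S_b):
    """q_ab of Eq. (42): (1/N) sum_i S_i^a S_i^b."""
    return sum(x * y for x, y in zip(S_a, S_b)) / len(S_a)


def equal_spin_probability(population):
    """q of Eq. (43): (1 + <q_ab>_{a != b}) / 2, averaged over all ordered pairs a != b."""
    P = len(population)
    total = sum(pair_overlap(population[a], population[b])
                for a in range(P) for b in range(P) if a != b)
    return (1 + total / (P * (P - 1))) / 2


print(equal_spin_probability([[1, -1, 1, 1], [1, 1, 1, -1], [-1, -1, 1, 1]]))
```

A fully converged population gives $q = 1$, and two exactly opposite strings give $q = 0$.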
The average pair correlation is measured from the genetic algorithm runs and used to account for the time dependence of $q$ in the numerical model. Alternatively, a closed model could easily be obtained by including $q_{\alpha\beta}$ as a dynamical variable in the model, as proposed in [15]. The result is shown in Fig. 4, where the evolution of the NK model fitness distribution under selection, mutation, and recombination is shown using a selection strength $\beta_s = 0.01$ and asymmetric crossover with $a = 1/(2N)$ (again, $\kappa_1 - \kappa_1^0$ is plotted and all other parameters are chosen as above). While the main dynamics is captured by the model, neglecting the spatial correlations here results in lower accuracy after a few tens of iterations of the model, as compared to the previous cases. This points to the limits of the present correlation model when compared with the model in [?], which explicitly calculates fluctuations from spin correlations. However, the simplicity of the correlation model makes it applicable to fitness landscapes where maximum entropy calculations are not feasible. Finally, the genetic algorithm run demonstrates that recombination is no guarantee of improved optimization as long as the encoding does not reward it with improved correlation.



IV. CONCLUSIONS 



A dynamical model for the mutation and recombination operators of genetic algorithms has been developed, based 
on a simple correlation measure. The motivation was the common intuition that the fitness correlation between 
parents and children is a measure for the convergence properties of genetic algorithms. The correlation determines 
the model for a microscopic mutation and recombination event, which is used as an input for a dynamical formalism 
of genetic algorithms. For two test functions, an additive, random field paramagnet and the spin glass motivated 
NK-model, the dynamics of a genetic algorithm has been modeled and compared to the average dynamics of an 
ensemble of real genetic algorithm runs. 

Three main results can be summarized from this study. First, the correlation model helped us understand the improved genetic algorithm performance on correlated landscapes. Second, we obtained a simple model for genetic algorithm dynamics on the basis of fitness correlations. A comparison to a more involved maximum entropy model [?] demonstrated that, for the cases considered, the main features of the dynamics are already contained in the fitness correlations of the genetic operators. This provides a simple model for fitness landscapes where maximum entropy calculations are not feasible, as in many practical applications. Third, we demonstrated a working model of genetic algorithm dynamics on a hard optimization problem, the rugged fitness landscape of the NK model.

A further goal of this study was to link fitness correlation measures, which are often used as empirical measures of genetic algorithm performance, to dynamical models of genetic algorithms. This touches on the issue of choosing the right algorithm for a given problem and on the question of which probes might help in this decision [16]. Fitness correlation measures are among the candidates for such probes.
FIGURES

FIG. 1. Measured evolution (solid lines) and predicted evolution (dashed lines) of $\kappa_1$ and $\kappa_2$ for the random field paramagnet fitness under selection and mutation.

FIG. 2. Measured and predicted evolution of $\kappa_1 - \kappa_1^0$ and $\kappa_2$ for the NK model fitness under selection and mutation.

FIG. 3. Measured and predicted evolution of $\kappa_1$ and $\kappa_2$ for the random field paramagnet fitness under selection, mutation, and recombination.

FIG. 4. Measured and predicted evolution of $\kappa_1 - \kappa_1^0$ and $\kappa_2$ for the NK model fitness under selection, mutation, and recombination.